Fix multi-coordinate indexes dropped in _replace_maybe_drop_dims #11286

benbovy merged 5 commits into pydata:main
…_copy_listed
When a custom Index spans multiple coordinates across different dimensions
(e.g. a UGRID topology index covering both `nodes` and `faces`), it was
incorrectly dropped during:
- DataArray dimension reduction (e.g. `.mean("extra_dim")`) via
`_replace_maybe_drop_dims`, which filtered coords by strict dim subset
without consulting `Index.should_add_coord_to_array()`
- Dataset subsetting by variable names (e.g. `ds[["node_data"]]`) via
`_copy_listed`, with the same issue
Both paths now use `should_add_coord_to_array()` for index-backed
coordinates, consistent with how `_construct_dataarray` already works.
Fixes pydata#11215
Co-authored-by: Claude <noreply@anthropic.com>
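The failure mode and fix described above can be sketched in plain Python. This is a hypothetical, self-contained illustration, not xarray source: `MultiDimIndex`, `filter_coords`, and the dims-tuple arguments are simplifications of the real `Index.should_add_coord_to_array(name, var, dims)` hook, which receives a `Variable` rather than a tuple of dims.

```python
# Hypothetical sketch of the coordinate filtering fixed in this PR.
# A custom index spanning both "nodes" and "faces" should keep all of
# its coordinates whenever any of its dimensions survive on the result.

class MultiDimIndex:
    """Simplified stand-in for a multi-coordinate xarray Index."""

    def __init__(self, coord_dims):
        self.coord_dims = coord_dims  # coord name -> tuple of dims
        self.all_dims = {d for dims in coord_dims.values() for d in dims}

    def should_add_coord_to_array(self, name, var_dims, array_dims):
        # Keep every coordinate of this index as long as any of the
        # index's dimensions are present on the target array.
        return bool(self.all_dims & set(array_dims))


def filter_coords(coords, indexes, dims):
    """Mimics the fixed logic: consult the index for index-backed
    coordinates, fall back to the strict dim-subset test otherwise."""
    kept = {}
    for name, var_dims in coords.items():
        if name in indexes:
            if indexes[name].should_add_coord_to_array(name, var_dims, dims):
                kept[name] = var_dims
        elif set(var_dims) <= dims:
            kept[name] = var_dims
    return kept


idx = MultiDimIndex({"node_x": ("nodes",), "face_x": ("faces",)})
coords = {"node_x": ("nodes",), "face_x": ("faces",), "plain": ("time",)}
indexes = {"node_x": idx, "face_x": idx}

# Old behavior: the strict subset test drops face_x (and the index with it).
old = {k: v for k, v in coords.items() if set(v) <= {"nodes"}}
assert old == {"node_x": ("nodes",)}

# Fixed behavior: the index keeps both of its coordinates.
assert filter_coords(coords, indexes, {"nodes"}) == {
    "node_x": ("nodes",),
    "face_x": ("faces",),
}
```

Non-indexed coordinates (like `plain` above) still follow the old strict-subset rule; only index-backed coordinates defer to the index.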
@benbovy, one more thing. I asked Claude if it used your suggestion in #11215 (comment) and it said:
- Cast OrderedSet to set when calling should_add_coord_to_array to satisfy the expected set[Hashable] type annotation
- Replace dict comprehensions with dict.fromkeys (ruff C416)
- Reformat assert messages to ruff-preferred style

Co-authored-by: Claude <noreply@anthropic.com>
benbovy left a comment:

Thanks @rsignell! That looks good to me.
For the case of Dataset.to_dataarray(), I think it should work similarly to DataArray._replace_maybe_drop_dims():
```python
coords = {}
for k, v in self.coords.items():
    if k in self._indexes:
        if self._indexes[k].should_add_coord_to_array(k, v, dims):
            coords[k] = v
    elif set(v.dims) <= dims:
        coords[k] = v
```

Review comment on:

```python
assert isinstance(next(iter(reduced_node.xindexes.values())), MultiDimIndex), (
    "Index dropped from node DataArray after reduction"
)
assert isinstance(next(iter(reduced_face.xindexes.values())), MultiDimIndex), (
    "Index dropped from face DataArray after reduction"
)
```
Suggested change:

```diff
-assert isinstance(next(iter(reduced_node.xindexes.values())), MultiDimIndex), (
-    "Index dropped from node DataArray after reduction"
-)
-assert isinstance(next(iter(reduced_face.xindexes.values())), MultiDimIndex), (
-    "Index dropped from face DataArray after reduction"
-)
+for da in [reduced_node, reduced_face]:
+    for name in ["node_x", "node_y", "face_x", "face_y"]:
+        assert name in da.coords
+        assert isinstance(ds.xindexes[name], MultiDimIndex)
```
The isinstance(next(iter(...))) checks should be enough, but this suggestion is to make the expected behavior clearer to the reader (or LLM).
Review comment on:

```python
assert isinstance(next(iter(node_subset.xindexes.values())), MultiDimIndex), (
    "Index dropped from Dataset when subsetting to node variable"
)
assert isinstance(next(iter(face_subset.xindexes.values())), MultiDimIndex), (
    "Index dropped from Dataset when subsetting to face variable"
)
```
Thanks for the review @benbovy! Here is the Claude Code plan to address your comments -- let me know if you would like changes!

Planned changes:

1. Fix
At first glance the proposed changes look good!
- Apply should_add_coord_to_array() in Dataset.to_dataarray() for consistent handling of multi-coordinate indexes (per benbovy review)
- Add test_to_dataarray_preserves_multi_coord_index to cover this path
- Simplify isinstance(next(iter(...))) checks to cleaner for-loops in both test_dataarray.py and test_dataset.py (per benbovy suggestion)

Co-authored-by: Claude <noreply@anthropic.com>
@benbovy, okay, Claude thinks it addressed the comments. Fingers crossed! 😄
Review comment on:

```python
variable = Variable(dims, data, self.attrs, fastpath=True)
coords = {k: v.variable for k, v in self.coords.items()}
broadcast_dims = set(broadcast_vars[0].dims)
```
Actually, a fix is not even necessary here, since Dataset variables are broadcast against each other and thus all coordinates are kept in the returned DataArray anyway, as mentioned in the docstrings. I added to_dataarray() to the list as I naively looked at all places where filter_indexes_from_coords is called, sorry.

(filter_indexes_from_coords is not even needed; copying self._indexes should be enough.)
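The reviewer's point can be checked with a toy sketch (hypothetical names, not xarray source): because to_dataarray() broadcasts all Dataset variables against each other, every coordinate's dims end up as a subset of the result's dims, so the subset filter keeps everything and plain dict copies are equivalent.

```python
# Hypothetical sketch: after broadcasting all Dataset variables, the
# result's dims cover every coordinate's dims, so filtering coords (or
# indexes) is a no-op and plain dict copies suffice.

def union_dims(variables):
    """Ordered union of dims, mimicking broadcasting over all variables."""
    dims = []
    for var_dims in variables.values():
        for d in var_dims:
            if d not in dims:
                dims.append(d)
    return tuple(dims)


variables = {"node_data": ("nodes",), "face_data": ("faces",)}
coords = {"node_x": ("nodes",), "face_x": ("faces",)}

dims = union_dims(variables)
kept = {k: v for k, v in coords.items() if set(v) <= set(dims)}
assert dims == ("nodes", "faces")
assert kept == coords  # the filter keeps everything; dict(coords) suffices
```

This only holds because coordinates in a Dataset can only use dimensions that exist on the dataset, all of which appear on the broadcast result.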
Per reviewer feedback, no fix was needed in to_dataarray(), since Dataset variables are broadcast and all coordinates are kept anyway. Replace the should_add_coord_to_array loop and filter_indexes_from_coords call with a simple dict copy of all coords and indexes.

Co-authored-by: Claude <noreply@anthropic.com>
@benbovy, I asked Claude to double-check if the issue was addressed:
LGTM too, thanks!



@benbovy, after some discussion with @Huite, he thought Claude Code might be capable of fixing this issue, so I gave it a shot. I'm only an appreciative user of the xarray codebase, so I don't really know how to test/evaluate this beyond the passing tests that Claude itself came up with. So I hope this isn't a waste of your time. With that said, here's what Claude came up with:
When a custom Index spans multiple coordinates across different dimensions (e.g. a UGRID topology index covering both `nodes` and `faces`), it was incorrectly dropped during:

- DataArray dimension reduction (e.g. `.mean("extra_dim")`) via `_replace_maybe_drop_dims`, which filtered coords by strict dim subset without consulting `Index.should_add_coord_to_array()`
- Dataset subsetting by variable names (e.g. `ds[["node_data"]]`) via `_copy_listed`, with the same issue

Both paths now use `should_add_coord_to_array()` for index-backed coordinates, consistent with how `_construct_dataarray` already works.

Description
Checklist
AI Disclosure
Tools: Claude Code